Multi‐scale attention encoder for street‐to‐aerial image geo‐localization

Abstract

The goal of street‐to‐aerial cross‐view image geo‐localization is to determine the location of a query street‐view image by retrieving the aerial‐view image taken at the same place. The drastic viewpoint and appearance gap between the two kinds of images poses a major challenge for this task. In this paper, we propose a novel multiscale attention encoder to capture the contextual information of aerial‐ and street‐view images. To bridge the domain gap between these two views, we first use an inverse polar transform to bring them into approximate alignment. Then, the multiscale attention encoder is applied to convert the images into feature representations under the guidance of the learnt contextual information. Finally, a global mining strategy enables the network to pay more attention to hard negative exemplars. Experiments on standard benchmark datasets show that our approach obtains a top‐1 recall rate of 81.39% on the CVUSA dataset and 71.52% on the CVACT dataset, achieving state‐of‐the‐art performance and significantly outperforming most existing methods.
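The retrieval setting and the top‐1 recall metric can be made concrete with a small NumPy sketch. It assumes every image has already been encoded into a fixed-length descriptor (the encoder itself is not shown), that street query i is paired with aerial image i, and that descriptors are compared by squared Euclidean distance after L2 normalization. The loss is a generic per-batch hardest-negative selection inside a weighted soft-margin triplet loss, a common choice in cross-view geo-localization that is swapped in here for illustration; it is not the paper's global mining strategy, whose details are not given in the abstract. All function names and the `alpha` value are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (one descriptor per image) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def distance_matrix(street: np.ndarray, aerial: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between unit-norm descriptors.

    Row i is street query i and column j is aerial image j. Assumption
    (hypothetical setup): the true match of query i is aerial image i.
    """
    s, a = l2_normalize(street), l2_normalize(aerial)
    # For unit vectors, ||s - a||^2 = 2 - 2 * <s, a>.
    return 2.0 - 2.0 * s @ a.T


def recall_at_k(dist: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose true aerial match ranks within the top k."""
    true_dist = np.diag(dist)[:, None]
    # Rank of the true match = number of aerial images strictly closer to the
    # query, so ties are counted in the query's favor.
    rank = (dist < true_dist).sum(axis=1)
    return float((rank < k).mean())


def hardest_negative_soft_margin_loss(dist: np.ndarray, alpha: float = 10.0) -> float:
    """Weighted soft-margin triplet loss with each query's hardest negative.

    Hypothetical stand-in for hard-negative mining: the negative for query i
    is the closest non-matching aerial image within the same batch.
    """
    n = dist.shape[0]
    pos = np.diag(dist)
    # Mask out the positive pairs so they cannot be picked as negatives.
    neg = np.where(np.eye(n, dtype=bool), np.inf, dist)
    hardest = neg.min(axis=1)
    # log(1 + exp(alpha * (d_pos - d_neg))), computed stably via logaddexp.
    return float(np.mean(np.logaddexp(0.0, alpha * (pos - hardest))))


if __name__ == "__main__":
    # Toy demo with random stand-in descriptors (not real image features).
    rng = np.random.default_rng(0)
    street = rng.normal(size=(8, 16))
    aerial = street + 0.1 * rng.normal(size=(8, 16))
    d = distance_matrix(street, aerial)
    print("recall@1:", recall_at_k(d, k=1))
    print("loss:", hardest_negative_soft_margin_loss(d))
```

Evaluated over a whole test set, `recall_at_k(distance_matrix(street_desc, aerial_desc), k=1)` corresponds to the kind of top‐1 recall figure quoted above, provided the descriptors are ordered so that matching pairs share an index. Mining the hardest negative concentrates the gradient on the aerial images that are most easily confused with the query.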

Similar resources

Large-Scale Image Geolocalization

In this chapter, we explore the task of global image geolocalization— estimating where on the Earth a photograph was captured. We examine variants of the “im2gps” algorithm using millions of “geotagged” Internet photographs as training data. We first discuss a simple to understand nearest-neighbor baseline. Next, we introduce a lazy-learning approach with more sophisticated features that double...

Multiscale Discriminant Saliency for Visual Attention

The bottom-up saliency, an early stage of humans’ visual attention, can be considered as a binary classification problem between center and surround classes. Discriminant power of features for the classification is measured as mutual information between features and two classes distribution. The estimated discrepancy of two feature classes very much depends on considered scale levels; then, mul...

Image denoising and restoration with CNN-LSTM Encoder Decoder with Direct Attention

Image denoising is always a challenging task in the field of computer vision and image processing. In this paper we have proposed an encoder-decoder model with direct attention, which is capable of denoising and reconstructing highly corrupted images. Our model consists of an encoder and a decoder, where encoder is a convolutional neural network and decoder is a multilayer Long Short-Term memo...

Recurrent Neural Network Encoder with Attention for Community Question Answering

We apply a general recurrent neural network (RNN) encoder framework to community question answering (cQA) tasks. Our approach does not rely on any linguistic processing, and can be applied to different languages or domains. Further improvements are observed when we extend the RNN encoders with a neural attention mechanism that encourages reasoning over entire sequences. To deal with practical i...

Unsupervised Multiscale Image Segmentation

We propose a general unsupervised multiscale feature-based approach towards image segmentation. Clusters in the feature space are assumed to be properties of underlying classes, the recovery of which is achieved by the use of the mean shift procedure, a robust non-parametric decomposition method. The subsequent classification procedure consists of Bayesian multiscale processing which models the ...

Journal

Journal title: CAAI Transactions on Intelligence Technology

Year: 2022

ISSN: 2468-2322, 2468-6557

DOI: https://doi.org/10.1049/cit2.12077